A multilingual text mining approach to web cross-lingual text retrieval
نویسندگان
چکیده
To enable concept-based cross-lingual text retrieval (CLTR) using multilingual text mining, our approach will first discover the multilingual concept–term relationships from linguistically diverse textual data relevant to a domain. Second, the multilingual concept–term relationships, in turn, are used to discover the conceptual content of the multilingual text, which is either a document containing potentially relevant information or a query expressing an information need. When language-independent concepts hidden beneath both document and query are revealed, concept-based matching is made possible. Hence, concept-based CLTR is facilitated. This approach is employed for developing a multi-agent system to facilitate concept-based CLTR on the Web. q 2004 Elsevier B.V. All rights reserved.
منابع مشابه
Towards Web Mining of Query Translations for Cross-Language Information Retrieval in Digital Libraries
This paper proposes an efficient client-server-based query translation approach to allowing more feasible implementation of cross-language information retrieval (CLIR) services in digital library (DL) systems. A centralized query translation server is constructed to process the translation requests of cross-lingual queries from connected DL systems. To extract translations not covered by standa...
متن کاملMining bilingual topic hierarchies from unaligned text
Recent years have seen an exponential growth in the amount of multilingual text available on the web. This situation raises the need for novel applications for organizing and accessing multilingual content. Common examples of such applications include Multilingual Topic Tracking, Cross-Language Information retrieval systems etc. Most of these applications rely on the availability of multilingua...
متن کاملEnglish-Persian Plagiarism Detection based on a Semantic Approach
Plagiarism which is defined as “the wrongful appropriation of other writers’ or authors’ works and ideas without citing or informing them” poses a major challenge to knowledge spread publication. Plagiarism has been placed in four categories of direct, paraphrasing (rewriting), translation, and combinatory. This paper addresses translational plagiarism which is sometimes referred to as cross-li...
متن کاملExploiting the Web as the multilingual corpus for unknown query translation
Users’ cross-lingual queries to a digital library system might be short and the query terms may not be included in a common translation dictionary (unknown terms). In this paper, we investigate the feasibility of exploiting the Web as the multilingual corpus source to translate unknown query terms for cross-language information retrieval in digital libraries. We propose a Web-based term transla...
متن کاملModern Multilingual and Cross-lingual Information Access Technologies
In this chapter, we describe the state of the art cross-lingual and multilingual strategies and their related areas. In particular, we show a WWW-based information system called MIETTA, which allows uniform and multilingual access to heterogeneous data sources in the tourism domain. The design of the search engine is based on a new cross-lingual framework. The framework integrates a cross-lingu...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Knowl.-Based Syst.
دوره 17 شماره
صفحات -
تاریخ انتشار 2004